Introduce new utilities for writing Alpaka kernels #43205

Merged (5 commits) on Nov 15, 2023


fwyzard
Contributor

@fwyzard fwyzard commented Nov 6, 2023

PR description:

Introduce four new utilities for writing Alpaka kernels:

  • blocks_with_stride(acc, size)
  • elements_in_block(acc, block, size)
  • once_per_grid(acc)
  • once_per_block(acc)

Simplify the unit tests, and extend them to cover the newly introduced functionality.


blocks_with_stride

blocks_with_stride(acc, size) returns a range that spans the (virtual) block indices required to cover the given problem size.

For example, if size is 1000 and the block size is 16, it will return the range from 0 to 62 (63 blocks of 16 elements cover 1008 elements, enough for a total size of 1000).
If the work division has more than 63 blocks, only the first 63 will perform one iteration of the loop, and the others will exit immediately.
If the work division has fewer than 63 blocks, some of the blocks will perform more than one iteration, in order to cover the whole problem space.

All threads in a block see the same loop iterations, while threads in different blocks may see a different number of iterations.
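
The arithmetic above can be checked with a small host-side model. This is only a sketch in plain C++, not the Alpaka implementation: the names virtual_blocks and blocks_visited are hypothetical, and it assumes that "block size" means the number of elements covered by one block and that each block strides by the number of blocks in the grid.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of virtual blocks needed to cover `size` elements,
// i.e. ceil(size / block_size).
std::size_t virtual_blocks(std::size_t size, std::size_t block_size) {
  assert(block_size > 0);
  return (size + block_size - 1) / block_size;
}

// Hypothetical host-side model (not the Alpaka implementation): the virtual
// block indices that physical block `block` visits in a grid of `grid_blocks`
// blocks, assuming a stride equal to the grid size.
std::vector<std::size_t> blocks_visited(std::size_t block,
                                        std::size_t grid_blocks,
                                        std::size_t size,
                                        std::size_t block_size) {
  assert(grid_blocks > 0 && block < grid_blocks);
  std::vector<std::size_t> visited;
  const std::size_t end = virtual_blocks(size, block_size);
  for (std::size_t b = block; b < end; b += grid_blocks) {
    visited.push_back(b);
  }
  return visited;
}
```

Under these assumptions, virtual_blocks(1000, 16) is 63. In a grid of 100 blocks, block 70 visits nothing and exits immediately, while in a grid of 32 blocks, block 0 visits virtual blocks 0 and 32.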

elements_in_block

elements_in_block(acc, block, size) returns a range that spans all the elements within the given block. Iterating over the range yields values of type ElementIndex, which contain both the .global and .local indices of the corresponding element.

If the work division has only one element per thread, the loop will perform at most one iteration.
If the work division has more than one element per thread, the loop will perform that number of iterations, or fewer if it reaches size.
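
The iteration counts described above can be illustrated with another host-side sketch in plain C++. The ElementIndex struct here is a stand-in with the two members named in the description, elements_visited is a hypothetical helper, and the layout (each thread owning a contiguous chunk of elements inside its block) is an assumption rather than something stated in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the ElementIndex type described above: the index of an
// element in the whole problem (.global) and within its block (.local).
struct ElementIndex {
  std::size_t global;
  std::size_t local;
};

// Hypothetical host-side model (not the Alpaka implementation): the elements
// that `thread` of virtual block `block` iterates over, assuming each thread
// owns a contiguous chunk of `elements_per_thread` elements.
std::vector<ElementIndex> elements_visited(std::size_t block,
                                           std::size_t thread,
                                           std::size_t threads_per_block,
                                           std::size_t elements_per_thread,
                                           std::size_t size) {
  assert(thread < threads_per_block);
  // Global index of the first element of this block.
  const std::size_t first = block * threads_per_block * elements_per_thread;
  // Local index of the first element owned by this thread.
  const std::size_t begin = thread * elements_per_thread;
  std::vector<ElementIndex> visited;
  for (std::size_t local = begin;
       local < begin + elements_per_thread && first + local < size;
       ++local) {
    visited.push_back({first + local, local});
  }
  return visited;
}
```

With size 1000 and 16 threads of one element each, thread 7 of block 62 performs a single iteration (global index 999, local index 7) and thread 8 performs none, because its element would fall beyond size.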

once_per_grid

once_per_grid(acc) evaluates to true for a single thread within the kernel execution grid.

Usually the condition is true for block 0 and thread 0, but these indices should not be relied upon.
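
As a rough illustration of the "exactly one thread in the grid" guarantee, the sketch below models the condition on the host in plain C++. The functions once_per_grid_model and count_grid_winners are hypothetical; picking block 0 and thread 0 is only the usual case mentioned above, and real kernels should call once_per_grid(acc) rather than test indices.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for once_per_grid(acc): true for a single
// (block, thread) pair. Block 0 / thread 0 is only the usual choice
// and must not be relied upon.
bool once_per_grid_model(std::size_t block, std::size_t thread) {
  return block == 0 && thread == 0;
}

// Count how many threads in a grid of `blocks` x `threads` pass the condition.
std::size_t count_grid_winners(std::size_t blocks, std::size_t threads) {
  assert(blocks > 0 && threads > 0);
  std::size_t winners = 0;
  for (std::size_t b = 0; b < blocks; ++b) {
    for (std::size_t t = 0; t < threads; ++t) {
      if (once_per_grid_model(b, t)) {
        ++winners;
      }
    }
  }
  return winners;
}
```

Whatever the grid shape, the count is 1, which is what makes the condition suitable for work that must happen once per kernel launch, such as resetting a global counter.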

once_per_block

once_per_block(acc) evaluates to true for a single thread within the block.

Usually the condition is true for thread 0, but this index should not be relied upon.
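
The per-block variant can be modelled the same way. This is again a host-side sketch in plain C++ with hypothetical names (once_per_block_model, winners_per_block); choosing thread 0 reflects only the usual case mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for once_per_block(acc): true for a single thread
// of each block. Thread 0 is only the usual choice and must not be relied upon.
bool once_per_block_model(std::size_t thread) {
  return thread == 0;
}

// For each block of the grid, count how many of its threads pass the condition.
std::vector<std::size_t> winners_per_block(std::size_t blocks, std::size_t threads) {
  assert(threads > 0);
  std::vector<std::size_t> winners(blocks, 0);
  for (std::size_t b = 0; b < blocks; ++b) {
    for (std::size_t t = 0; t < threads; ++t) {
      if (once_per_block_model(t)) {
        ++winners[b];
      }
    }
  }
  return winners;
}
```

Every entry of the returned vector is 1, so the condition fits work that each block must do once, for example having a single thread publish the block's partial result.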

PR validation:

The updated unit tests compile and pass.

If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:

If the master branch is moved to CMSSW_14_0_X, this PR will be backported to CMSSW_13_3_X.

@fwyzard
Contributor Author

fwyzard commented Nov 6, 2023

enable gpu

@fwyzard
Contributor Author

fwyzard commented Nov 6, 2023

please test

@cmsbuild
Contributor

cmsbuild commented Nov 6, 2023

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43205/37529

@cmsbuild
Contributor

cmsbuild commented Nov 6, 2023

A new Pull Request was created by @fwyzard (Andrea Bocci) for master.

It involves the following packages:

  • HeterogeneousCore/AlpakaInterface (heterogeneous)

@fwyzard, @makortel can you please review it and eventually sign? Thanks.
@missirol, @makortel, @rovere this is something you requested to watch as well.
@sextonkennedy, @rappoccio, @antoniovilela you are the release manager for this.

cms-bot commands are listed here

@cmsbuild
Contributor

cmsbuild commented Nov 6, 2023

-1

Failed Tests: RelVals-GPU GpuUnitTests
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-139b5e/35643/summary.html
COMMIT: 1798b10
CMSSW: CMSSW_13_3_X_2023-11-06-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/43205/35643/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-GPU

  • 12434.587: 12434.587_TTbar_14TeV+2023_Patatrack_AllTripletsGPU_Validation/step2_TTbar_14TeV+2023_Patatrack_AllTripletsGPU_Validation.log

GPU Unit Tests

I found 1 errors in the following unit tests:

---> test alpakaTestKernelCudaAsync had ERRORS

Comparison Summary

Summary:

  • You potentially added 10 lines to the logs
  • Reco comparison results: 136 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3363010
  • DQMHistoTests: Total failures: 1790
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3361198
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Nov 6, 2023

GPU Unit Tests

I found 1 errors in the following unit tests:

---> test alpakaTestKernelCudaAsync had ERRORS

The original test fails to run, but exits with a non-error status:

$ $CMSSW_FULL_RELEASE_BASE/test/$SCRAM_ARCH/alpakaTestKernelCudaAsync
No devices available on the platform alpaka_cuda_async, the test will be skipped.
No devices available on the platform alpaka_cuda_async, the test will be skipped.
No devices available on the platform alpaka_cuda_async, the test will be skipped.
===============================================================================
test cases: 3 | 3 passed
assertions: - none -

I thought SCRAM would not run the CUDA tests if cudaIsEnabled fails?

@makortel
Contributor

makortel commented Nov 7, 2023

I thought SCRAM would not run the CUDA tests if cudaIsEnabled fails?

That is the default behavior, but for GPU_X IBs and the PR GPU tests, all tests depending on cuda are run explicitly. This is visible, e.g., in the PR GPU unit test log (https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-139b5e/35643/gpuUnitTests/log.txt):

+ eval USER_UNIT_TESTS=cuda timeout 7320 scram b -v -k -j 4 unittests
++ USER_UNIT_TESTS=cuda
++ timeout 7320 scram b -v -k -j 4 unittests

So the node where the GPU tests were run did not have a GPU (setup compatible with CUDA 12.2?) after all?

@fwyzard
Contributor Author

fwyzard commented Nov 7, 2023 via email

@fwyzard
Contributor Author

fwyzard commented Nov 7, 2023

@makortel

  1. Do you think the way the test is written is OK, or should we have a different behaviour?
  2. Do the other changes look good?

@makortel
Contributor

makortel commented Nov 7, 2023

Looks ok to me

@fwyzard
Contributor Author

fwyzard commented Nov 7, 2023

+heterogenous

@fwyzard
Contributor Author

fwyzard commented Nov 7, 2023

+heterogeneous

@cmsbuild
Contributor

cmsbuild commented Nov 7, 2023

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @antoniovilela, @sextonkennedy, @rappoccio (and backports should be raised in the release meeting by the corresponding L2)

@fwyzard
Contributor Author

fwyzard commented Nov 14, 2023

please test

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-43205/37682

@cmsbuild
Contributor

Pull request #43205 was updated. @fwyzard, @makortel can you please check and sign again.

@cmsbuild
Contributor

+1

Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-139b5e/35811/summary.html
COMMIT: 8c859bc
CMSSW: CMSSW_14_0_X_2023-11-14-1100/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/43205/35811/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 256 lines from the logs
  • Reco comparison results: 136 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3363028
  • DQMHistoTests: Total failures: 2392
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3360614
  • DQMHistoTests: Total skipped: 22
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 49 files compared)
  • Checked 214 log files, 167 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

GPU Comparison Summary

Summary:

  • No significant changes to the logs found
  • Reco comparison results: 48 differences found in the comparisons
  • DQMHistoTests: Total files compared: 3
  • DQMHistoTests: Total histograms compared: 39740
  • DQMHistoTests: Total failures: 1835
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 37905
  • DQMHistoTests: Total skipped: 0
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB( 2 files compared)
  • Checked 8 log files, 10 edm output root files, 3 DQM output files
  • TriggerResults: no differences found

@fwyzard
Contributor Author

fwyzard commented Nov 14, 2023

+heterogeneous

@fwyzard fwyzard changed the title from "Add the blocks_with_stride and elements_in_block ranges" to "Introduce new utilities for writing Alpaka kernels" on Nov 14, 2023
@cmsbuild
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @antoniovilela, @sextonkennedy (and backports should be raised in the release meeting by the corresponding L2)

@antoniovilela
Contributor

+1

@cmsbuild cmsbuild merged commit 18bde74 into cms-sw:master Nov 15, 2023
@fwyzard fwyzard deleted the implement_blocks_with_stride branch January 30, 2024 11:12
6 participants